{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 扩增数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在shuffle-data中，数据扩增在train-dev上面，test数据用于做模型评估用，所以不进行扩增。\n",
    "\n",
    "扩增规则：\n",
    "\n",
    "> if A==B & A==C  then B==C \n",
    "\n",
    "只扩增正样本，负样本暂不进行处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 筛选并按照query1进行group分组，整理出list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>query1</th>\n",
       "      <th>query2</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血是什么原因？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后为什么会咯血？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血，应该怎么处理？</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血，需要就医吗？</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血，是否很严重？</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1396</th>\n",
       "      <td>肺炎</td>\n",
       "      <td>请问有肺炎该怎么治疗？</td>\n",
       "      <td>肺炎需要吸痰吗</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1397</th>\n",
       "      <td>支原体肺炎</td>\n",
       "      <td>得了小儿支原体肺炎应该注意什么？</td>\n",
       "      <td>小儿支原体肺炎有什么需要注意的？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1398</th>\n",
       "      <td>哮喘</td>\n",
       "      <td>急性支气管炎治疗方法？</td>\n",
       "      <td>急性支气管炎的治疗方法是什么？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1399</th>\n",
       "      <td>感冒</td>\n",
       "      <td>两种感冒药混吃会要命吗</td>\n",
       "      <td>两种感冒药混吃可以吗</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1400</th>\n",
       "      <td>胸膜炎</td>\n",
       "      <td>胸膜炎包裹性积液能吃抗核药吗</td>\n",
       "      <td>胸膜炎包裹性积液可否吃抗核药？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10148 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     category            query1            query2  label\n",
       "0          咳血     剧烈运动后咯血，是怎么了？     剧烈运动后咯血是什么原因？      1\n",
       "1          咳血     剧烈运动后咯血，是怎么了？      剧烈运动后为什么会咯血？      1\n",
       "2          咳血     剧烈运动后咯血，是怎么了？   剧烈运动后咯血，应该怎么处理？      0\n",
       "3          咳血     剧烈运动后咯血，是怎么了？    剧烈运动后咯血，需要就医吗？      0\n",
       "4          咳血     剧烈运动后咯血，是怎么了？    剧烈运动后咯血，是否很严重？      0\n",
       "...       ...               ...               ...    ...\n",
       "1396       肺炎       请问有肺炎该怎么治疗？           肺炎需要吸痰吗      0\n",
       "1397    支原体肺炎  得了小儿支原体肺炎应该注意什么？  小儿支原体肺炎有什么需要注意的？      1\n",
       "1398       哮喘       急性支气管炎治疗方法？   急性支气管炎的治疗方法是什么？      1\n",
       "1399       感冒       两种感冒药混吃会要命吗        两种感冒药混吃可以吗      1\n",
       "1400      胸膜炎    胸膜炎包裹性积液能吃抗核药吗   胸膜炎包裹性积液可否吃抗核药？      1\n",
       "\n",
       "[10148 rows x 4 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data = pd.read_csv(\"shuffle-data/train_data.csv\")\n",
    "dev_data = pd.read_csv(\"shuffle-data/dev_data.csv\")\n",
    "train = pd.concat([train_data,dev_data])\n",
    "train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_query_list(data_group):\n",
    "    l = []\n",
    "    for query in data_group:\n",
    "        l.append(query)\n",
    "    return l\n",
    "\n",
    "new = train.loc[train[\"label\"]==1,:].groupby(\"query1\")[\"query2\"].agg(get_query_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([list(['加湿器肺炎是怎样回事']),\n",
       "       list(['11个月大孩子老是流鼻涕怎么治', '宝宝11个月了，老是流鼻涕有什么办法']),\n",
       "       list(['肺炎13介疫苗需要打吗？', '有必要打13介肺炎疫苗吗？', '打13介肺炎疫苗管用吗？', '注射13介肺炎疫苗有作用吗？']),\n",
       "       ..., list(['鼻疽假单胞菌肺炎有怎样的治疗办法？', '鼻疽假单胞菌肺炎的治疗方法是什么？']),\n",
       "       list(['鼻子有血丝出来是鼻咽癌吗？', '回吸鼻涕有血丝，是鼻咽癌吗？', '鼻痛鼻涕有血丝是不是鼻癌', '鼻痛鼻涕有血丝跟鼻癌有什么关系吗']),\n",
       "       list(['吃了太多的龙鹿丸导致嗓子疼该怎么办', '龙鹿丸吃了嗓子疼是什么情况'])], dtype=object)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 在list中两两组合成pair"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_data = pd.DataFrame(columns=('category', 'query1', 'query2','label'))\n",
    "for line in new:\n",
    "    if len(line)>1:\n",
    "        content = line[0]\n",
    "        category = train.loc[train[\"query2\"]==content,\"category\"].values[0]\n",
    "        for idx1,q1 in enumerate(line):\n",
    "            for idx2 in range(idx1+1,len(line)):\n",
    "                query1 = line[idx1]\n",
    "                query2 = line[idx2]\n",
    "                new_data = new_data.append([{'category':category,'query1':query1,'query2':query2,'label':1}])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>query1</th>\n",
       "      <th>query2</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>感冒</td>\n",
       "      <td>11个月大孩子老是流鼻涕怎么治</td>\n",
       "      <td>宝宝11个月了，老是流鼻涕有什么办法</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>肺炎</td>\n",
       "      <td>肺炎13介疫苗需要打吗？</td>\n",
       "      <td>有必要打13介肺炎疫苗吗？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>肺炎</td>\n",
       "      <td>肺炎13介疫苗需要打吗？</td>\n",
       "      <td>打13介肺炎疫苗管用吗？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>肺炎</td>\n",
       "      <td>肺炎13介疫苗需要打吗？</td>\n",
       "      <td>注射13介肺炎疫苗有作用吗？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>肺炎</td>\n",
       "      <td>有必要打13介肺炎疫苗吗？</td>\n",
       "      <td>打13介肺炎疫苗管用吗？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>感冒</td>\n",
       "      <td>鼻子有血丝出来是鼻咽癌吗？</td>\n",
       "      <td>鼻痛鼻涕有血丝跟鼻癌有什么关系吗</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>感冒</td>\n",
       "      <td>回吸鼻涕有血丝，是鼻咽癌吗？</td>\n",
       "      <td>鼻痛鼻涕有血丝是不是鼻癌</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>感冒</td>\n",
       "      <td>回吸鼻涕有血丝，是鼻咽癌吗？</td>\n",
       "      <td>鼻痛鼻涕有血丝跟鼻癌有什么关系吗</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>感冒</td>\n",
       "      <td>鼻痛鼻涕有血丝是不是鼻癌</td>\n",
       "      <td>鼻痛鼻涕有血丝跟鼻癌有什么关系吗</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>上呼吸道感染</td>\n",
       "      <td>吃了太多的龙鹿丸导致嗓子疼该怎么办</td>\n",
       "      <td>龙鹿丸吃了嗓子疼是什么情况</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3251 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   category             query1              query2 label\n",
       "0        感冒    11个月大孩子老是流鼻涕怎么治  宝宝11个月了，老是流鼻涕有什么办法     1\n",
       "0        肺炎       肺炎13介疫苗需要打吗？       有必要打13介肺炎疫苗吗？     1\n",
       "0        肺炎       肺炎13介疫苗需要打吗？        打13介肺炎疫苗管用吗？     1\n",
       "0        肺炎       肺炎13介疫苗需要打吗？      注射13介肺炎疫苗有作用吗？     1\n",
       "0        肺炎      有必要打13介肺炎疫苗吗？        打13介肺炎疫苗管用吗？     1\n",
       "..      ...                ...                 ...   ...\n",
       "0        感冒      鼻子有血丝出来是鼻咽癌吗？    鼻痛鼻涕有血丝跟鼻癌有什么关系吗     1\n",
       "0        感冒     回吸鼻涕有血丝，是鼻咽癌吗？        鼻痛鼻涕有血丝是不是鼻癌     1\n",
       "0        感冒     回吸鼻涕有血丝，是鼻咽癌吗？    鼻痛鼻涕有血丝跟鼻癌有什么关系吗     1\n",
       "0        感冒       鼻痛鼻涕有血丝是不是鼻癌    鼻痛鼻涕有血丝跟鼻癌有什么关系吗     1\n",
       "0    上呼吸道感染  吃了太多的龙鹿丸导致嗓子疼该怎么办       龙鹿丸吃了嗓子疼是什么情况     1\n",
       "\n",
       "[3251 rows x 4 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，按照上述规则，可以扩增了3251条数据（无序，即不区分A-B与B-A）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 重新再切新的train和dev"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>query1</th>\n",
       "      <th>query2</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血是什么原因？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后为什么会咯血？</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血，应该怎么处理？</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血，需要就医吗？</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>咳血</td>\n",
       "      <td>剧烈运动后咯血，是怎么了？</td>\n",
       "      <td>剧烈运动后咯血，是否很严重？</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13394</th>\n",
       "      <td>感冒</td>\n",
       "      <td>鼻子有血丝出来是鼻咽癌吗？</td>\n",
       "      <td>鼻痛鼻涕有血丝跟鼻癌有什么关系吗</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13395</th>\n",
       "      <td>感冒</td>\n",
       "      <td>回吸鼻涕有血丝，是鼻咽癌吗？</td>\n",
       "      <td>鼻痛鼻涕有血丝是不是鼻癌</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13396</th>\n",
       "      <td>感冒</td>\n",
       "      <td>回吸鼻涕有血丝，是鼻咽癌吗？</td>\n",
       "      <td>鼻痛鼻涕有血丝跟鼻癌有什么关系吗</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13397</th>\n",
       "      <td>感冒</td>\n",
       "      <td>鼻痛鼻涕有血丝是不是鼻癌</td>\n",
       "      <td>鼻痛鼻涕有血丝跟鼻癌有什么关系吗</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13398</th>\n",
       "      <td>上呼吸道感染</td>\n",
       "      <td>吃了太多的龙鹿丸导致嗓子疼该怎么办</td>\n",
       "      <td>龙鹿丸吃了嗓子疼是什么情况</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>13399 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      category             query1            query2 label\n",
       "0           咳血      剧烈运动后咯血，是怎么了？     剧烈运动后咯血是什么原因？     1\n",
       "1           咳血      剧烈运动后咯血，是怎么了？      剧烈运动后为什么会咯血？     1\n",
       "2           咳血      剧烈运动后咯血，是怎么了？   剧烈运动后咯血，应该怎么处理？     0\n",
       "3           咳血      剧烈运动后咯血，是怎么了？    剧烈运动后咯血，需要就医吗？     0\n",
       "4           咳血      剧烈运动后咯血，是怎么了？    剧烈运动后咯血，是否很严重？     0\n",
       "...        ...                ...               ...   ...\n",
       "13394       感冒      鼻子有血丝出来是鼻咽癌吗？  鼻痛鼻涕有血丝跟鼻癌有什么关系吗     1\n",
       "13395       感冒     回吸鼻涕有血丝，是鼻咽癌吗？      鼻痛鼻涕有血丝是不是鼻癌     1\n",
       "13396       感冒     回吸鼻涕有血丝，是鼻咽癌吗？  鼻痛鼻涕有血丝跟鼻癌有什么关系吗     1\n",
       "13397       感冒       鼻痛鼻涕有血丝是不是鼻癌  鼻痛鼻涕有血丝跟鼻癌有什么关系吗     1\n",
       "13398   上呼吸道感染  吃了太多的龙鹿丸导致嗓子疼该怎么办     龙鹿丸吃了嗓子疼是什么情况     1\n",
       "\n",
       "[13399 rows x 4 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_train = pd.concat([train,new_data])\n",
    "new_train = new_train.reset_index()\n",
    "new_train = new_train.drop(labels=\"index\",axis = 1)\n",
    "new_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "train_index,dev_index,_,_ = train_test_split(new_train.index,new_train[\"label\"],\n",
    "                                             test_size=0.2,random_state=2020,\n",
    "                                             stratify=new_train[\"label\"])\n",
    "train_data = new_train.iloc[train_index,:]\n",
    "dev_data = new_train.iloc[dev_index,:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 10719 entries, 3519 to 3091\n",
      "Data columns (total 4 columns):\n",
      " #   Column    Non-Null Count  Dtype \n",
      "---  ------    --------------  ----- \n",
      " 0   category  10719 non-null  object\n",
      " 1   query1    10719 non-null  object\n",
      " 2   query2    10719 non-null  object\n",
      " 3   label     10719 non-null  object\n",
      "dtypes: object(4)\n",
      "memory usage: 418.7+ KB\n"
     ]
    }
   ],
   "source": [
    "train_data.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 2680 entries, 12486 to 1440\n",
      "Data columns (total 4 columns):\n",
      " #   Column    Non-Null Count  Dtype \n",
      "---  ------    --------------  ----- \n",
      " 0   category  2680 non-null   object\n",
      " 1   query1    2680 non-null   object\n",
      " 2   query2    2680 non-null   object\n",
      " 3   label     2680 non-null   object\n",
      "dtypes: object(4)\n",
      "memory usage: 104.7+ KB\n"
     ]
    }
   ],
   "source": [
    "dev_data.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_data.to_csv(\"augment-data/train_data.csv\",index=False)\n",
    "dev_data.to_csv(\"augment-data/dev_data.csv\",index=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tf2_py37",
   "language": "python",
   "name": "tf2_py37"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
